Language/OS - Multiplatform Resource Library

Text File | 1992-08-05 | 53KB | 1,453 lines

Info file: cmu-user.info,    -*-Text-*-
produced by latexinfo-format-buffer
from file: cmu-user.tex

File: cmu-user.info  Node: Global Function Type Inference, Prev: Local Function Type Inference, Up: Type Inference, Next: Operation Specific Type Inference

Global Function Type Inference
------------------------------

As described in section *Note Function Types::, a global function type (ftype) declaration places implicit type assertions on the call arguments, and also guarantees the type of the return value. So wherever a call to a declared function appears, there is no doubt as to the types of the arguments and return value. Furthermore, Python will infer a function type from the function's definition if there is no `ftype' declaration. Any type declarations on the argument variables are used as the argument types in the derived function type, and the compiler's best guess for the result type of the function is used as the result type in the derived function type.

This method of deriving function types from the definition implicitly assumes that functions won't be redefined at run-time. Consider this example:

     (defun foo-p (x)
       (let ((res (and (consp x) (eq (car x) 'foo))))
         (format t "It is ~:[not ~;~]foo." res)))

     (defun frob (it)
       (if (foo-p it)
           (setf (cadr it) 'yow!)
           (1+ it)))

Presumably, the programmer really meant to return `res' from `foo-p', but he seems to have forgotten. When he tries to call `(frob (list 'foo nil))', `frob' will flame out when it tries to add to a `cons'. Realizing his error, he fixes `foo-p' and recompiles it. But when he retries his test case, he is baffled because the error is still there. What happened in this example is that Python proved that the result of `foo-p' is `null', and then proceeded to optimize away the `setf' in `frob'. Fortunately, in this example, the error is detected at compile time due to notes about unreachable code (?.)
Still, some users may not want to worry about this sort of problem during incremental development, so there is a variable to control deriving function types.

 -- Variable: *derive-function-types*
     If true (the default), argument and result type information derived from compilation of `defun's is used when compiling calls to that function. If false, only information from `ftype' proclamations will be used.

File: cmu-user.info  Node: Operation Specific Type Inference, Prev: Global Function Type Inference, Up: Type Inference, Next: Dynamic Type Inference

Operation Specific Type Inference
---------------------------------

Many of the standard CMU Common Lisp functions have special type inference procedures that determine the result type as a function of the argument types. For example, the result type of `aref' is the array element type. Here are some other examples of type inferences:

     (logand x #xFF) => (unsigned-byte 8)

     (+ (the (integer 0 12) x) (the (integer 0 1) y)) => (integer 0 13)

     (ash (the (unsigned-byte 16) x) -8) => (unsigned-byte 8)

File: cmu-user.info  Node: Dynamic Type Inference, Prev: Operation Specific Type Inference, Up: Type Inference, Next: Type Check Optimization

Dynamic Type Inference
----------------------

Python uses flow analysis to infer types in dynamically typed programs. For example:

     (etypecase x
       (list (length x))
       ...)

Here, the compiler knows the argument to `length' is a list, because the call to `length' is only done when `x' is a list. The most significant efficiency effect of inference from assertions is usually in type check optimization.

Dynamic type inference has two inputs: explicit conditionals and implicit or explicit type assertions. Flow analysis propagates these constraints on variable type to any code that can be executed only after passing through the constraint.
Explicit type constraints come from `if's where the test is either a lexical variable or a function of lexical variables and constants, where the function is either a type predicate, a numeric comparison or `eq'.

If there is an `eq' (or `eql') test, then the compiler will actually substitute one argument for the other in the true branch. For example:

     (when (eq x :yow!) (return x))

becomes:

     (when (eq x :yow!) (return :yow!))

This substitution is done when one argument is a constant, or one argument has better type information than the other. This transformation reveals opportunities for constant folding or type-specific optimizations. If the test is against a constant, then the compiler can prove that the variable is not that constant value in the false branch, or `(not (member :yow!))' in the example above. This can eliminate redundant tests, for example:

     (if (eq x nil)
         ...
         (if x a b))

is transformed to this:

     (if (eq x nil)
         ...
         a)

Variables appearing as `if' tests are interpreted as `(not (eq VAR nil))' tests. The compiler also converts `=' into `eql' where possible. It is difficult to do inference directly on `=' since it does implicit coercions.

When there is an explicit `<' or `>' test on integer variables, the compiler makes inferences about the ranges the variables can assume in the true and false branches. This is mainly useful when it proves that the values are small enough in magnitude to allow open-coding of arithmetic operations. For example, in many uses of `dotimes' with a `fixnum' repeat count, the compiler proves that fixnum arithmetic can be used.

Implicit type assertions are quite common, especially if you declare function argument types. Dynamic inference from implicit type assertions sometimes helps to disambiguate programs to a useful degree, but is most noticeable when it detects a dynamic type error. For example:

     (defun foo (x)
       (+ (car x) x))

results in this warning:

     In: DEFUN FOO
       (+ (CAR X) X)
     ==>
       X
     Warning: Result is a LIST, not a NUMBER.
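
The explicit-conditional half of dynamic type inference described in this node can be sketched in portable Common Lisp. The function `describe-size' is a hypothetical name invented for this illustration, and whether a particular compiler version actually drops the implicit checks is an assumption; the code only shows the shape of program in which each operation runs after a test that establishes its argument type.

```lisp
;; Hypothetical example, not from the manual.  Each clause body is
;; reached only after its type test succeeds, so the type of X is
;; known to flow analysis inside the clause.
(defun describe-size (x)
  (typecase x
    (list (length x))    ; reached only when X is a list
    (fixnum (abs x))     ; reached only when X is a fixnum
    (t 0)))              ; any other object
```

Calling `(describe-size '(a b c))' takes the `list' clause, so the call to `length' there needs no separate list check of its own.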
Note that Common Lisp's dynamic type checking semantics make dynamic type inference useful even in programs that aren't really dynamically typed, for example:

     (+ (car x) (length x))

Here, `x' presumably always holds a list, but in the absence of a declaration the compiler cannot assume `x' is a list simply because list-specific operations are sometimes done on it. The compiler must consider the program to be dynamically typed until it proves otherwise. Dynamic type inference proves that the argument to `length' is always a list because the call to `length' is only done after the list-specific `car' operation.

File: cmu-user.info  Node: Type Check Optimization, Prev: Dynamic Type Inference, Up: Type Inference

Type Check Optimization
-----------------------

Python backs up its support for precise type checking by minimizing the cost of run-time type checking. This is done both through type inference and through optimizations of type checking itself.

Type inference often allows the compiler to prove that a value is of the correct type, and thus no type check is necessary. For example:

     (defstruct foo a b c)
     (defstruct link
       (foo (required-argument) :type foo)
       (next nil :type (or link null)))

     (foo-a (link-foo x))

Here, there is no need to check that the result of `link-foo' is a `foo', since it always is. Even when some type checks are necessary, type inference can often reduce the number:

     (defun test (x)
       (let ((a (foo-a x))
             (b (foo-b x))
             (c (foo-c x)))
         ...))

In this example, only one `(foo-p x)' check is needed. This applies to a lesser degree in list operations, such as:

     (if (eql (car x) 3) (cdr x) y)

Here, we only have to check that `x' is a list once.

Since Python recognizes explicit type tests, code that explicitly protects itself against type errors has little introduced overhead due to implicit type checking.
For example, this loop compiles with no implicit checks for `car' and `cdr':

     (defun memq (e l)
       (do ((current l (cdr current)))
           ((atom current) nil)
         (when (eq (car current) e) (return current))))

Python reduces the cost of checks that must be done through an optimization called COMPLEMENTING. A complemented check for TYPE is simply a check that the value is not of the type `(not TYPE)'. This is only interesting when something is known about the actual type, in which case we can test for the complement of `(and KNOWN-TYPE (not TYPE))', or the difference between the known type and the assertion. An example:

     (link-foo (link-next x))

Here, we change the type check for `link-foo' from a test for `link' to a test for:

     (not (and (or link null) (not link)))

or more simply `(not null)'. This is probably the most important use of complementing, since the situation is fairly common, and a `null' test is much cheaper than a structure type test.

Here is a more complicated example that illustrates the combination of complementing with dynamic type inference:

     (defun find-a (a x)
       (declare (type (or link null) x))
       (do ((current x (link-next current)))
           ((null current) nil)
         (let ((foo (link-foo current)))
           (when (eq (foo-a foo) a) (return foo)))))

This loop can be compiled with no type checks. The `link' test for `link-foo' and `link-next' is complemented to `(not null)', and then deleted because of the explicit `null' test. As before, no check is necessary for `foo-a', since the `link-foo' is always a `foo'. This sort of situation shows how precise type checking combined with precise declarations can actually result in reduced type checking.

File: cmu-user.info  Node: Source Optimization, Prev: Type Inference, Up: Advanced Compiler Use and Efficiency Hints, Next: Tail Recursion

Source Optimization
===================

This section describes source-level transformations that Python does on programs in an attempt to make them more efficient.
Although source-level optimizations can make existing programs more efficient, the biggest advantage of this sort of optimization is that it makes it easier to write efficient programs. If a clean, straightforward implementation can be transformed into an efficient one, then there is no need for tricky and dangerous hand optimization.

* Menu:

* Let Optimization::
* Constant Folding::
* Unused Expression Elimination::
* Control Optimization::
* Unreachable Code Deletion::
* Multiple Values Optimization::
* Source to Source Transformation::
* Style Recommendations::

File: cmu-user.info  Node: Let Optimization, Prev: Source Optimization, Up: Source Optimization, Next: Constant Folding

Let Optimization
----------------

The primary optimization of let variables is to delete them when they are unnecessary. Whenever the value of a let variable is a constant, a constant variable or a constant (local or non-notinline) function, the variable is deleted, and references to the variable are replaced with references to the constant expression. This is useful primarily in the expansion of macros or inline functions, where argument values are often constant in any given call, but are in general non-constant expressions that must be bound to preserve order of evaluation. Let variable optimization eliminates the need for macros to carefully avoid spurious bindings, and also makes inline functions just as efficient as macros.

A particularly interesting class of constant is a local function.
Substituting for lexical variables that are bound to a function can substantially improve the efficiency of functional programming styles, for example:

     (let ((a #'(lambda (x) (zow x))))
       (funcall a 3))

effectively transforms to:

     (zow 3)

This transformation is done even when the function is a closure, as in:

     (let ((a (let ((y (zug)))
                #'(lambda (x) (zow x y)))))
       (funcall a 3))

becoming:

     (zow 3 (zug))

A constant variable is a lexical variable that is never assigned to, always keeping its initial value. Whenever possible, avoid setting lexical variables -- instead bind a new variable to the new value. Except for loop variables, it is almost always possible to avoid setting lexical variables. This form:

     (let ((x (f x)))
       ...)

is MORE efficient than this form:

     (setq x (f x))
     ...

Setting variables makes the program more difficult to understand, both to the compiler and to the programmer. Python compiles assignments at least as efficiently as any other Common Lisp compiler, but most let optimizations are only done on constant variables.

Constant variables with only a single use are also optimized away, even when the initial value is not constant. (7) (*Note Let Optimization-Footnotes::) For example, this expansion of `incf':

     (let ((#:g3 (+ x 1)))
       (setq x #:G3))

becomes:

     (setq x (+ x 1))

The type semantics of this transformation are more important than the elimination of the variable itself. Consider what happens when `x' is declared to be a `fixnum'; after the transformation, the compiler can compile the addition knowing that the result is a `fixnum', whereas before the transformation the addition would have to allow for fixnum overflow.

Another variable optimization deletes any variable that is never read. This causes the initial value and any assigned values to be unused, allowing those expressions to be deleted if they have no side-effects.
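
The advice in this node to bind rather than set can be made concrete with a small sketch. The two functions below are hypothetical names invented for this illustration and compute the same value; only the second uses constant variables in the sense defined above, so only it is a candidate for the let optimizations described here. No claim is made about the code any particular compiler version generates for either.

```lisp
;; Hypothetical helpers; both compute (x * 2) + 1.

;; Assignment style: X is set twice, so it is not a constant variable.
(defun scale-by-setting (x)
  (setq x (* x 2))
  (setq x (+ x 1))
  x)

;; Binding style: each intermediate value gets its own variable that
;; is never assigned after its initial binding.
(defun scale-by-binding (x)
  (let* ((doubled (* x 2))
         (result (+ doubled 1)))
    result))
```

In the binding version `doubled' and `result' each have a single use, which is the case the single-use substitution above is meant to cover.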
Note that a let is actually a degenerate case of local call (?), and that let optimization can be done on calls that weren't created by a let. Also, local call allows an applicative style of iteration that is totally assignment free.

File: cmu-user.info  Node: Let Optimization-Footnotes, Up: Let Optimization

(7) The source transformation in this example doesn't represent the preservation of evaluation order implicit in the compiler's internal representation. Where necessary, the back end will reintroduce temporaries to preserve the semantics.

File: cmu-user.info  Node: Constant Folding, Prev: Let Optimization, Up: Source Optimization, Next: Unused Expression Elimination

Constant Folding
----------------

Constant folding is an optimization that replaces a call of constant arguments with the constant result of that call. Constant folding is done on all standard functions for which it is legal. Inline expansion allows folding of any constant parts of the definition, and can be done even on functions that have side-effects.

It is convenient to rely on constant folding when programming, as in this example:

     (defconstant limit 42)

     (defun foo ()
       (... (1- limit) ...))

Constant folding is also helpful when writing macros or inline functions, since it usually eliminates the need to write a macro that special-cases constant arguments.

Constant folding of a user defined function is enabled by the `extensions:constant-function' proclamation. In this example:

     (declaim (ext:constant-function myexp))
     (defun myexp (x y)
       (declare (single-float x y))
       (exp (* (log x) y)))

      ... (myexp 3.0 1.3) ...

The call to `myexp' is constant-folded to `4.1711674'.

File: cmu-user.info  Node: Unused Expression Elimination, Prev: Constant Folding, Up: Source Optimization, Next: Control Optimization

Unused Expression Elimination
-----------------------------

If the value of any expression is not used, and the expression has no side-effects, then it is deleted.
As with constant folding, this optimization applies most often when cleaning up after inline expansion and other optimizations. Any function declared an `extensions:constant-function' is also subject to unused expression elimination.

Note that Python will eliminate parts of unused expressions known to be side-effect free, even if there are other unknown parts. For example:

     (let ((a (list (foo) (bar))))
       (if t
           (zow)
           (raz a)))

becomes:

     (progn (foo) (bar))
     (zow)

File: cmu-user.info  Node: Control Optimization, Prev: Unused Expression Elimination, Up: Source Optimization, Next: Unreachable Code Deletion

Control Optimization
--------------------

The most important optimization of control is recognizing when an if test is known at compile time, then deleting the `if', the test expression, and the unreachable branch of the `if'. This can be considered a special case of constant folding, although the test doesn't have to be truly constant as long as it is definitely not false. Note also that type inference propagates the result of an `if' test to the true and false branches, *Note Dynamic Type Inference::.

A related `if' optimization is this transformation: (8) (*Note Control Optimization-Footnotes::)

     (if (if a b c) x y)

into:

     (if a
         (if b x y)
         (if c x y))

The opportunity for this sort of optimization usually results from a conditional macro. For example:

     (if (not a) x y)

is actually implemented as this:

     (if (if a nil t) x y)

which is transformed to this:

     (if a
         (if nil x y)
         (if t x y))

which is then optimized to this:

     (if a y x)

Note that due to Python's internal representations, the `if'--`if' situation will be recognized even if other forms are wrapped around the inner `if', like:

     (if (let ((g ...))
           (loop
             ...
             (return (not g))
             ...))
         x y)

In Python, all the CMU Common Lisp macros really are macros, written in terms of `if', `block' and `tagbody', so user-defined control macros can be just as efficient as the standard ones.
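
As a sketch of the kind of user-defined control macro the preceding paragraph has in mind, the macro below is written directly in terms of `if' and `not', so its expansion has the `if'--`if' shape discussed in this node. The names `my-unless' and `sign-label' are hypothetical, invented for this illustration, and the claim that the nested `if' is simplified is taken from the text above rather than verified here.

```lisp
;; Hypothetical control macro.  (not TEST) in the test position of IF
;; is the conditional-macro pattern discussed above.
(defmacro my-unless (test &body body)
  `(if (not ,test)
       (progn ,@body)
       nil))

;; Hypothetical caller: MY-UNLESS yields NIL when N is negative,
;; so OR falls through to the second value.
(defun sign-label (n)
  (or (my-unless (minusp n) :non-negative)
      :negative))
```

Nothing in `my-unless' tries to avoid the `not'; the point of the section is that a straightforward expansion like this one should not need hand tuning.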
Python emits basic blocks using a heuristic that minimizes the number of unconditional branches. The code in a `tagbody' will not be emitted in the order it appeared in the source, so there is no point in arranging the code to make control drop through to the target.

File: cmu-user.info  Node: Control Optimization-Footnotes, Up: Control Optimization

(8) Note that the code for `x' and `y' isn't actually replicated.

File: cmu-user.info  Node: Unreachable Code Deletion, Prev: Control Optimization, Up: Source Optimization, Next: Multiple Values Optimization

Unreachable Code Deletion
-------------------------

Python will delete code whenever it can prove that the code can never be executed. Code becomes unreachable when:

   * An `if' is optimized away, or

   * There is an explicit unconditional control transfer such as `go' or `return-from', or

   * The last reference to a local function is deleted (or there never was any reference.)

When code that appeared in the original source is deleted, the compiler prints a note to indicate a possible problem (or at least unnecessary code.) For example:

     (defun foo ()
       (if t
           (write-line "True.")
           (write-line "False.")))

will result in this note:

     In: DEFUN FOO
       (WRITE-LINE "False.")
     Note: Deleting unreachable code.

It is important to pay attention to unreachable code notes, since they often indicate a subtle type error. For example:

     (defstruct foo a b)

     (defun lose (x)
       (let ((a (foo-a x))
             (b (if x (foo-b x) :none)))
         ...))

results in this note:

     In: DEFUN LOSE
       (IF X (FOO-B X) :NONE)
     ==>
       :NONE
     Note: Deleting unreachable code.

The :none is unreachable, because type inference knows that the argument to `foo-a' must be a `foo', and thus can't be false. Presumably the programmer forgot that `x' could be false when he wrote the binding for `a'.
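
One possible repair of `lose' is sketched below. The name `lose-fixed' and the choice to return both values as a list are assumptions made for this illustration, since the body of the original function is elided; the point is only that `x' is tested before either accessor is applied, so neither branch is ruled out by type inference.

```lisp
(defstruct foo a b)

;; Hypothetical repair of LOSE: X is tested before FOO-A or FOO-B is
;; applied, so the :NONE branch stays reachable when X is false.
(defun lose-fixed (x)
  (if x
      (list (foo-a x) (foo-b x))
      (list :none :none)))
```

With this arrangement, calling the function with a false argument exercises the `:none' branch instead of signalling a type error in `foo-a'.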
Here is an example with an incorrect declaration:

     (defun count-a (string)
       (do ((pos 0 (position #\a string :start (1+ pos)))
            (count 0 (1+ count)))
           ((null pos) count)
         (declare (fixnum pos))))

This time our note is:

     In: DEFUN COUNT-A
       (DO ((POS 0 #) (COUNT 0 #))
           ((NULL POS) COUNT)
         (DECLARE (FIXNUM POS)))
     --> BLOCK LET TAGBODY RETURN-FROM PROGN
     ==>
       COUNT
     Note: Deleting unreachable code.

The problem here is that `pos' can never be null since it is declared a `fixnum'.

It takes some experience with unreachable code notes to be able to tell what they are trying to say. In non-obvious cases, the best thing to do is to call the function in a way that should cause the unreachable code to be executed. Either you will get a type error, or you will find that there truly is no way for the code to be executed.

Not all unreachable code results in a note:

   * A note is only given when the unreachable code textually appears in the original source. This prevents spurious notes due to the optimization of macros and inline functions, but sometimes also forgoes a note that would have been useful.

   * Since accurate source information is not available for non-list forms, there is an element of heuristic in determining whether or not to give a note about an atom. Spurious notes may be given when a macro or inline function defines a variable that is also present in the calling function. Notes about false and true are never given, since it is too easy to confuse these constants in expanded code with ones in the original source.

   * Notes are only given about code unreachable due to control flow. There is no note when an expression is deleted because its value is unused, since this is a common consequence of other optimizations.

Somewhat spurious unreachable code notes can also result when a macro inserts multiple copies of its arguments in different contexts, for example:

     (defmacro t-and-f (var form)
       `(if ,var ,form ,form))

     (defun foo (x)
       (t-and-f x (if x "True." "False.")))

results in these notes:

     In: DEFUN FOO
       (IF X "True." "False.")
     ==>
       "False."
     Note: Deleting unreachable code.

     ==>
       "True."
     Note: Deleting unreachable code.

It seems like it has deleted both branches of the `if', but it has really deleted one branch in one copy, and the other branch in the other copy. Note that these messages are only spurious in not satisfying the intent of the rule that notes are only given when the deleted code appears in the original source; there is always SOME code being deleted when an unreachable code note is printed.

File: cmu-user.info  Node: Multiple Values Optimization, Prev: Unreachable Code Deletion, Up: Source Optimization, Next: Source to Source Transformation

Multiple Values Optimization
----------------------------

Within a function, Python implements uses of multiple values particularly efficiently. Multiple values can be kept in arbitrary registers, so using multiple values doesn't imply stack manipulation and representation conversion. For example, this code:

     (let ((a (if x (foo x) u))
           (b (if x (bar x) v)))
       ...)

is actually more efficient written this way:

     (multiple-value-bind (a b)
         (if x
             (values (foo x) (bar x))
             (values u v))
       ...)

Also, ? for information on how local call provides efficient support for multiple function return values.

File: cmu-user.info  Node: Source to Source Transformation, Prev: Multiple Values Optimization, Up: Source Optimization, Next: Style Recommendations

Source to Source Transformation
-------------------------------

The compiler implements a number of operation-specific optimizations as source-to-source transformations. You will often see unfamiliar code in error messages, for example:

     (defun my-zerop () (zerop x))

gives this warning:

     In: DEFUN MY-ZEROP
       (ZEROP X)
     ==>
       (= X 0)
     Warning: Undefined variable: X

The original `zerop' has been transformed into a call to `='. This transformation is indicated with the same `==>' used to mark macro and function inline expansion.
Although it can be confusing, display of the transformed source is important, since warnings are given with respect to the transformed source. This is a more obscure example:

     (defun foo (x) (logand 1 x))

gives this efficiency note:

     In: DEFUN FOO
       (LOGAND 1 X)
     ==>
       (LOGAND C::Y C::X)
     Note: Forced to do static-function Two-arg-and (cost 53).
           Unable to do inline fixnum arithmetic (cost 1) because:
           The first argument is a INTEGER, not a FIXNUM.
           etc.

Here, the compiler commuted the call to `logand', introducing temporaries. The note complains that the FIRST argument is not a `fixnum', when in the original call, it was the second argument. To make things more confusing, the compiler introduced temporaries called `c::x' and `c::y' that are bound to `1' and `x', respectively.

You will also notice source-to-source optimizations when efficiency notes are enabled (?.) When the compiler is unable to do a transformation that might be possible if there were more information, then an efficiency note is printed. For example, `my-zerop' above will also give this efficiency note:

     In: DEFUN MY-ZEROP
       (ZEROP X)
     ==>
       (= X 0)
     Note: Unable to optimize because:
           Operands might not be the same type, so can't open code.

File: cmu-user.info  Node: Style Recommendations, Prev: Source to Source Transformation, Up: Source Optimization

Style Recommendations
---------------------

Source level optimization makes possible a clearer and more relaxed programming style:

   * Don't use macros purely to avoid function call. If you want an inline function, write it as a function and declare it inline. It's clearer, less error-prone, and works just as well.

   * Don't write macros that try to "optimize" their expansion in trivial ways such as avoiding binding variables for simple expressions. The compiler does these optimizations too, and is less likely to make a mistake.

   * Make use of local functions (i.e., `labels' or `flet') and tail-recursion in places where it is clearer. Local function call is faster than full call.
   * Avoid setting local variables when possible. Binding a new `let' variable is at least as efficient as setting an existing variable, and is easier to understand, both for the compiler and the programmer.

   * Instead of writing similar code over and over again so that it can be hand customized for each use, define a macro or inline function, and let the compiler do the work.

File: cmu-user.info  Node: Tail Recursion, Prev: Source Optimization, Up: Advanced Compiler Use and Efficiency Hints, Next: Local Call

Tail Recursion
==============

A call is tail-recursive if nothing has to be done after the call returns, i.e. when the call returns, the returned value is immediately returned from the calling function. In this example, the recursive call to `myfun' is tail-recursive:

     (defun myfun (x)
       (if (oddp (random x))
           (isqrt x)
           (myfun (1- x))))

Tail recursion is interesting because it is a form of recursion that can be implemented much more efficiently than general recursion. In general, a recursive call requires the compiler to allocate storage on the stack at run-time for every call that has not yet returned. This memory consumption makes recursion unacceptably inefficient for representing repetitive algorithms having large or unbounded size. Tail recursion is the special case of recursion that is semantically equivalent to the iteration constructs normally used to represent repetition in programs. Because tail recursion is equivalent to iteration, tail-recursive programs can be compiled as efficiently as iterative programs.

So why would you want to write a program recursively when you can write it using a loop? Well, the main answer is that recursion is a more general mechanism, so it can express some solutions simply that are awkward to write as a loop. Some programmers also feel that recursion is a stylistically preferable way to write loops because it avoids assigning variables.
For example, instead of writing:

     (defun fun1 (x)
       something-that-uses-x)

     (defun fun2 (y)
       something-that-uses-y)

     (do ((x something (fun2 (fun1 x))))
         (nil))

You can write:

     (defun fun1 (x)
       (fun2 something-that-uses-x))

     (defun fun2 (y)
       (fun1 something-that-uses-y))

     (fun1 something)

The tail-recursive definition is actually more efficient, in addition to being (arguably) clearer. As the number of functions and the complexity of their call graph increases, the simplicity of using recursion becomes compelling. Consider the advantages of writing a large finite-state machine with separate tail-recursive functions instead of using a single huge `prog'.

It helps to understand how to use tail recursion if you think of a tail-recursive call as a `psetq' that assigns the argument values to the called function's variables, followed by a `go' to the start of the called function. This makes clear an inherent efficiency advantage of tail-recursive call: in addition to not having to allocate a stack frame, there is no need to prepare for the call to return (e.g., by computing a return PC.)

Is there any disadvantage to tail recursion? Other than an increase in efficiency, the only way you can tell that a call has been compiled tail-recursively is if you use the debugger. Since a tail-recursive call has no stack frame, there is no way the debugger can print out the stack frame representing the call. The effect is that backtrace will not show some calls that would have been displayed in a non-tail-recursive implementation. In practice, this is not as bad as it sounds -- in fact it isn't really clearly worse, just different. *Note Debug Tail Recursion:: for information about the debugger implications of tail recursion.

In order to ensure that tail-recursion is preserved in arbitrarily complex calling patterns across separately compiled functions, the compiler must compile any call in a tail-recursive position as a tail-recursive call.
This is done regardless of whether the program actually exhibits any sort of recursive calling pattern. In this example, the call to `fun2' will always be compiled as a tail-recursive call:

     (defun fun1 (x)
       (fun2 x))

So tail recursion doesn't necessarily have anything to do with recursion as it is normally thought of. ? for more discussion of using tail recursion to implement loops.

* Menu:

* Tail Recursion Exceptions::

File: cmu-user.info  Node: Tail Recursion Exceptions, Prev: Tail Recursion, Up: Tail Recursion

Tail Recursion Exceptions
-------------------------

Although Python is claimed to be "properly" tail-recursive, some might dispute this, since there are situations where tail recursion is inhibited:

   * When the call is enclosed by a special binding, or

   * When the call is enclosed by a `catch' or `unwind-protect', or

   * When the call is enclosed by a `block' or `tagbody' and the block name or `go' tag has been closed over.

These dynamic extent binding forms inhibit tail recursion because they allocate stack space to represent the binding. Shallow-binding implementations of dynamic scoping also require cleanup code to be evaluated when the scope is exited.

File: cmu-user.info  Node: Local Call, Prev: Tail Recursion, Up: Advanced Compiler Use and Efficiency Hints, Next: Block Compilation

Local Call
==========

Python supports two kinds of function call: full call and local call. Full call is the standard calling convention; its late binding and generality make Common Lisp what it is, but create unavoidable overheads. When the compiler can compile the calling function and the called function simultaneously, it can use local call to avoid some of the overhead of full call. Local call is really a collection of compilation strategies. If some aspect of call overhead is not needed in a particular local call, then it can be omitted. In some cases, local call can be totally free.
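
As a rough illustration of a call where caller and callee are compiled together, the sketch below defines a helper with `flet', so every call to it is visible to the compiler at once. The names `sum-of-squares' and `square' are hypothetical, invented for this example, and how much call overhead is actually removed is an assumption that depends on the compiler.

```lisp
;; Hypothetical example: SQUARE exists only inside SUM-OF-SQUARES,
;; so both of its call sites are compiled along with its definition.
(defun sum-of-squares (a b)
  (flet ((square (n) (* n n)))
    (+ (square a) (square b))))
```

The call to `sum-of-squares' itself, made from other code through the global function name, remains an ordinary full call.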
Local call provides two main advantages to the user:

   * Local call makes the use of the lexical function binding forms `flet' and `labels' much more efficient. A local call is always faster than a full call, and in many cases is much faster.

   * Local call is a natural approach to block compilation, a compilation technique that resolves function references at compile time. Block compilation speeds function call, but increases compilation times and prevents function redefinition.

* Menu:

* Self-Recursive Calls::
* Let Calls::
* Closures::
* Local Tail Recursion::
* Return Values::

File: cmu-user.info  Node: Self-Recursive Calls, Prev: Local Call, Up: Local Call, Next: Let Calls

Self-Recursive Calls
--------------------

Local call is used when a function defined by `defun' calls itself. For example:

     (defun fact (n)
       (if (zerop n)
           1
           (* n (fact (1- n)))))

This use of local call speeds recursion, but can also complicate debugging, since trace will only show the first call to `fact', and not the recursive calls. This is because the recursive calls directly jump to the start of the function, and don't indirect through the `symbol-function'. Self-recursive local call is inhibited when the :block-compile argument to `compile-file' is false (?.)

File: cmu-user.info  Node: Let Calls, Prev: Self-Recursive Calls, Up: Local Call, Next: Closures

Let Calls
---------

Because local call avoids unnecessary call overheads, the compiler internally uses local call to implement some macros and special forms that are not normally thought of as involving a function call. For example, this `let':

     (let ((a (foo))
           (b (bar)))
       ...)

is internally represented as though it was macroexpanded into:

     (funcall #'(lambda (a b)
                  ...)
              (foo)
              (bar))

This implementation is acceptable because the simple cases of local call (equivalent to a `let') result in good code. This doesn't make `let' any more efficient, but does make local calls that are semantically the same as `let' much more efficient than full calls.
For example, these definitions are all the same as far as the compiler is concerned:

     (defun foo ()
       ...some other stuff...
       (let ((a something))
         ...some stuff...))

     (defun foo ()
       (flet ((localfun (a)
                ...some stuff...))
         ...some other stuff...
         (localfun something)))

     (defun foo ()
       (let ((funvar #'(lambda (a)
                         ...some stuff...)))
         ...some other stuff...
         (funcall funvar something)))

Although local call is most efficient when the function is called only once, a call doesn't have to be equivalent to a `let' to be more efficient than full call.  All local calls avoid the overhead of argument count checking and keyword argument parsing, and there are a number of other advantages that apply in many common situations.  *Note Let Optimization:: for a discussion of the optimizations done on let calls.

File: cmu-user.info  Node: Closures, Prev: Let Calls, Up: Local Call, Next: Local Tail Recursion

Closures
--------

Local call allows for much more efficient use of closures, since the closure environment doesn't need to be allocated on the heap, or even stored in memory at all.  In this example, there is no penalty for `localfun' referencing `a' and `b':

     (defun foo (a b x)
       (flet ((localfun (x)
                (1+ (* a b x))))
         (if (= a b)
             (localfun (- x))
             (localfun x))))

In local call, the compiler effectively passes closed-over values as extra arguments, so there is no need for you to "optimize" local function use by explicitly passing in lexically visible values.  Closures may also be subject to let optimization (*Note Let Optimization::.)

Note: indirect value cells are currently always allocated on the heap when a variable is both assigned to (with `setq' or `setf') and closed over, regardless of whether the closure is a local function or not.  This is another reason to avoid setting variables when you don't have to.
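
The note about value cells suggests a style in which closed-over variables are bound once and never assigned.  A minimal sketch, assuming the hypothetical names `scaled-sum-setq' and `scaled-sum': the first version assigns to `total' from inside a local function, so `total' is both set and closed over; the second threads the running total through as an argument, leaving only the read-only `scale' closed over.  Whether the first version really allocates a value cell depends on the compiler.

```lisp
;; Hypothetical example; none of these names appear in the manual.

;; TOTAL is closed over by ADD and also assigned with SETQ, which is
;; the combination the note above warns about.
(defun scaled-sum-setq (scale numbers)
  (let ((total 0))
    (flet ((add (x)
             (setq total (+ total (* scale x)))))
      (dolist (n numbers total)
        (add n)))))

;; Same result without assignment: the running total is passed as an
;; argument, and only SCALE (never assigned) is closed over.
(defun scaled-sum (scale numbers)
  (labels ((walk (tail total)
             (if (null tail)
                 total
                 (walk (cdr tail) (+ total (* scale (car tail)))))))
    (walk numbers 0)))
```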
File: cmu-user.info  Node: Local Tail Recursion, Prev: Closures, Up: Local Call, Next: Return Values

Local Tail Recursion
--------------------

Tail-recursive local calls are particularly efficient, since they are in effect an assignment plus a control transfer.  Scheme programmers write loops with tail-recursive local calls, instead of using the imperative `go' and `setq'.  This has not caught on in the Common Lisp community, since conventional Common Lisp compilers don't implement local call.  In Python, users can choose to write loops such as:

     (defun ! (n)
       (labels ((loop (n total)
                  (if (zerop n)
                      total
                      (loop (1- n) (* n total)))))
         (loop n 1)))

 -- Macro: iterate NAME ({(VAR INITIAL-VALUE)}*) {DECLARATION}* {FORM}*
     This macro provides syntactic sugar for using `labels' to do iteration.  It creates a local function NAME with the specified VARs as its arguments and the DECLARATIONs and FORMs as its body.  This function is then called with the INITIAL-VALUEs, and the result of the call is returned from the macro.

     Here is our factorial example rewritten using `iterate':

          (defun ! (n)
            (iterate loop
                     ((n n)
                      (total 1))
              (if (zerop n)
                  total
                  (loop (1- n) (* n total)))))

     The main advantage of using `iterate' over `do' is that `iterate' naturally allows stepping to be done differently depending on conditionals in the body of the loop.  `iterate' can also be used to implement algorithms that aren't really iterative by simply doing a non-tail call.  For example, the standard recursive definition of factorial can be written like this:

          (iterate fact
                   ((n n))
            (if (zerop n)
                1
                (* n (fact (1- n)))))

File: cmu-user.info  Node: Return Values, Prev: Local Tail Recursion, Up: Local Call

Return Values
-------------

One of the more subtle costs of full call comes from allowing arbitrary numbers of return values.  This overhead can be avoided in local calls to functions that always return the same number of values.
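
As a sketch of what returning the same number of values on every path looks like, the hypothetical `lookup-size' below always produces exactly two values, padding the failure case with an explicit second value instead of returning a bare `nil'.  The function name and the association-list representation are assumptions made only for this illustration.

```lisp
;; Hypothetical example; LOOKUP-SIZE is not part of the manual.
;; Every path returns exactly two values: the size and a found flag.
(defun lookup-size (name table)
  (let ((entry (assoc name table)))
    (if entry
        (values (cdr entry) t)
        (values 0 nil))))      ; padded so the value count stays fixed
```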
For efficiency reasons (as well as stylistic ones), you should write functions so that they always return the same number of values.  This may require passing extra false arguments to `values' in some cases, but the result is more efficient, not less so.

When efficiency notes are enabled (?), and the compiler wants to use known values return, but can't prove that the function always returns the same number of values, then it will print a note like this:

     In: DEFUN GRUE
       (DEFUN GRUE (X) (DECLARE (FIXNUM X)) (COND (# #) (# NIL) (T #)))
     Note: Return type not fixed values, so can't use known return convention:
       (VALUES (OR (INTEGER -536870912 -1) NULL) &REST T)

In order to implement proper tail recursion in the presence of known values return (*Note Tail Recursion::), the compiler sometimes must prove that multiple functions all return the same number of values.  When this can't be proven, the compiler will print a note like this:

     In: DEFUN BLUE
       (DEFUN BLUE (X) (DECLARE (FIXNUM X)) (COND (# #) (# #) (# #) (T #)))
     Note: Return value count mismatch prevents known return from these functions:
       BLUE
       SNOO

? for the interaction between local call and the representation of numeric types.

File: cmu-user.info  Node: Block Compilation, Prev: Local Call, Up: Advanced Compiler Use and Efficiency Hints, Next: Inline Expansion

Block Compilation
=================

Block compilation allows calls to global functions defined by `defun' to be compiled as local calls.  The function call can be in a different top-level form than the `defun', or even in a different file.

In addition, block compilation allows the declaration of the entry points to the block compiled portion.  An entry point is any function that may be called from outside of the block compilation.  If a function is not an entry point, then it can be compiled more efficiently, since all calls are known at compile time.  In particular, if a function is only called in one place, then it will be let converted.
This effectively inline expands the function, but without the code duplication that results from defining the function normally and then declaring it inline.

The main advantage of block compilation is that it preserves efficiency in programs even when (for readability and syntactic convenience) they are broken up into many small functions.  There is absolutely no overhead for calling a non-entry point function that is defined purely for modularity (i.e. called only in one place.)

Block compilation also allows the use of non-descriptor arguments and return values in non-trivial programs (?).

* Menu:

* Block Compilation Semantics::
* Block Compilation Declarations::
* Compiler Arguments::
* Practical Difficulties::

File: cmu-user.info  Node: Block Compilation Semantics, Prev: Block Compilation, Up: Block Compilation, Next: Block Compilation Declarations

Block Compilation Semantics
---------------------------

The effect of block compilation can be envisioned as the compiler turning all the `defun's in the block compilation into a single `labels' form:

     (declaim (start-block fun1 fun3))
     (defun fun1 ()
       ...)
     (defun fun2 ()
       ...
       (fun1)
       ...)
     (defun fun3 (x)
       (if x
           (fun1)
           (fun2)))
     (declaim (end-block))

becomes:

     (labels ((fun1 ()
                ...)
              (fun2 ()
                ...
                (fun1)
                ...)
              (fun3 (x)
                (if x
                    (fun1)
                    (fun2))))
       (setf (fdefinition 'fun1) #'fun1)
       (setf (fdefinition 'fun3) #'fun3))

Calls between the block compiled functions are local calls, so changing the global definition of `fun1' will have no effect on what `fun2' does; `fun2' will keep calling the old `fun1'.

The entry points `fun1' and `fun3' are still installed in the `symbol-function' as the global definitions of the functions, so a full call to an entry point works just as before.  However, `fun2' is not an entry point, so it is not globally defined.  In addition, `fun2' is only called in one place, so it will be let converted.
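
The `labels' analogy can be tried out directly in portable Common Lisp, without any block compilation declarations.  The sketch below only emulates the semantics described above; the names `bc-fun1', `bc-fun2', `bc-fun3' and `block-demo' are hypothetical.

```lisp
;; Hypothetical emulation of a block with entry points BC-FUN1 and
;; BC-FUN3; BC-FUN2 stays internal to the block.
(labels ((bc-fun1 () :old)
         (bc-fun2 () (bc-fun1))
         (bc-fun3 (x) (if x (bc-fun1) (bc-fun2))))
  ;; Only the entry points are installed as global definitions.
  (setf (fdefinition 'bc-fun1) #'bc-fun1)
  (setf (fdefinition 'bc-fun3) #'bc-fun3))

;; Now change the global definition of the entry point BC-FUN1.
(setf (fdefinition 'bc-fun1) (lambda () :new))

(defun block-demo ()
  (list (funcall 'bc-fun1)       ; full call: sees the new definition
        (funcall 'bc-fun3 nil)   ; internal calls still reach the old BC-FUN1
        (fboundp 'bc-fun2)))     ; BC-FUN2 was never defined globally
```

Calling `(block-demo)' in a fresh image returns `(:new :old nil)', matching the statement that calls inside the block keep using the old definition.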
File: cmu-user.info  Node: Block Compilation Declarations, Prev: Block Compilation Semantics, Up: Block Compilation, Next: Compiler Arguments

Block Compilation Declarations
------------------------------

The `extensions:start-block' and `extensions:end-block' declarations allow fine-grained control of block compilation.  These declarations are only legal as global declarations (`declaim' or `proclaim').

The `start-block' declaration has this syntax:

     (start-block {ENTRY-POINT-NAME}*)

When processed by the compiler, this declaration marks the start of block compilation, and specifies the entry points to that block.  If no entry points are specified, then ALL functions are made into entry points.  If already block compiling, then the compiler ends the current block and starts a new one.

The `end-block' declaration has no arguments:

     (end-block)

The `end-block' declaration ends a block compilation unit without starting a new one.  This is useful mainly when only a portion of a file is worth block compiling.

File: cmu-user.info  Node: Compiler Arguments, Prev: Block Compilation Declarations, Up: Block Compilation, Next: Practical Difficulties

Compiler Arguments
------------------

The :block-compile and :entry-points arguments to `extensions:compile-from-stream' and `compile-file' (*Note Calling the Compiler::) provide overall control of block compilation, and allow block compilation without requiring modification of the program source.

There are three possible values of the :block-compile argument:

false
     Do no compile-time resolution of global function names, not even for self-recursive calls.  This inhibits any `start-block' declarations appearing in the file, allowing all functions to be incrementally redefined.

true
     Start compiling in block compilation mode.  This is mainly useful for block compiling small files that contain no `start-block' declarations.  See also the :entry-points argument.
:specified
     Start compiling in form-at-a-time mode, but exploit `start-block' declarations and compile self-recursive calls as local calls.  Normally :specified is the default for this argument (see `*block-compile-default*'.)

The :entry-points argument can be used in conjunction with :block-compile true to specify the entry-points to a block-compiled file.  If not specified or `nil', all global functions will be compiled as entry points.  When :block-compile is not true, this argument is ignored.

 -- Variable: *block-compile-default*
     This variable determines the default value for the :block-compile argument to `compile-file' and `compile-from-stream'.  The initial value of this variable is :specified, but false is sometimes useful for totally inhibiting block compilation.

File: cmu-user.info  Node: Practical Difficulties, Prev: Compiler Arguments, Up: Block Compilation

Practical Difficulties
----------------------

The main problem with block compilation is that the compiler uses large amounts of memory when it is block compiling.  This places an upper limit on the amount of code that can be block compiled as a unit.  To make best use of block compilation, it is necessary to locate the parts of the program containing many internal calls, and then add the appropriate `start-block' declarations.  When writing new code, it is a good idea to put in block compilation declarations from the very beginning, since writing block declarations correctly requires accurate knowledge of the program's function call structure.  If you want to initially develop code with full incremental redefinition, you can compile with `*block-compile-default*' (*Note Compiler Arguments::) set to false.

Note that if a `defun' appears in a non-null lexical environment, then calls to it cannot be block compiled.

Unless files are very small, it is probably impractical to block compile multiple files as a unit by specifying a list of files to `compile-file'.  Semi-inline expansion (*Note Semi-Inline Expansion::)
provides another way to extend block compilation across file boundaries.

File: cmu-user.info  Node: Inline Expansion, Prev: Block Compilation, Up: Advanced Compiler Use and Efficiency Hints, Next: Object Representation

Inline Expansion
================

Python can expand almost any function inline, including functions with keyword arguments.  The only restrictions are that keyword argument keywords in the call must be constant, and that global function definitions (`defun') must be done in a null lexical environment (not nested in a `let' or other binding form.)  Local functions (`flet') can be inline expanded in any environment.  Combined with Python's source-level optimization, inline expansion can be used for things that formerly required macros for efficient implementation.  In Python, macros don't have any efficiency advantage, so they need only be used where a macro's syntactic flexibility is required.

Inline expansion is a compiler optimization technique that reduces the overhead of a function call by simply not doing the call: instead, the compiler effectively rewrites the program to appear as though the definition of the called function was inserted at each call site.  In Common Lisp, this is straightforwardly expressed by inserting the `lambda' corresponding to the original definition:

     (proclaim '(inline my-1+))
     (defun my-1+ (x) (+ x 1))

     (my-1+ someval) => ((lambda (x) (+ x 1)) someval)

When the function expanded inline is large, the program after inline expansion may be substantially larger than the original program.  If the program becomes too large, inline expansion hurts speed rather than helping it, since hardware resources such as physical memory and cache will be exhausted.  Inline expansion is called for:

   * When profiling has shown that a relatively simple function is called so often that a large amount of time is being wasted in the calling of that function (as opposed to running in that function.)
     If a function is complex, it will take a long time to run relative to the time spent in the call, so the speed advantage of inline expansion is diminished at the same time the space cost of inline expansion is increased.  Of course, if a function is rarely called, then the overhead of calling it is also insignificant.

   * With functions so simple that they take less space to inline expand than would be taken to call the function (such as `my-1+' above.)  It would require intimate knowledge of the compiler to be certain when inline expansion would reduce space, but it is generally safe to inline expand functions whose definition is a single function call, or a few calls to simple CMU Common Lisp functions.

In addition to this speed/space tradeoff from inline expansion's avoidance of the call, inline expansion can also reveal opportunities for optimization.  Python's extensive source-level optimization can make use of context information from the caller to tremendously simplify the code resulting from the inline expansion of a function.

The main form of caller context is local information about the actual argument values: what the argument types are and whether the arguments are constant.  Knowledge about argument types can eliminate run-time type tests (e.g., for generic arithmetic.)  Constant arguments in a call provide opportunities for constant folding optimization after inline expansion.

A hidden way that constant arguments are often supplied to functions is through the defaulting of unsupplied optional or keyword arguments.
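
A small sketch of the kind of call that benefits: the hypothetical `scale' below takes two keyword arguments with constant defaults, and its caller supplies neither.  Once `scale' is expanded inline, a compiler is free to fold the defaulted `factor' and `offset' into the arithmetic; whether and how far it does so depends on the implementation, and the names here are assumptions made only for illustration.

```lisp
;; Hypothetical example; SCALE and DOUBLE-ALL are not from the manual.
(declaim (inline scale))
(defun scale (x &key (factor 2) (offset 0))
  (+ (* x factor) offset))

;; Neither keyword is supplied here, so after inline expansion both
;; FACTOR and OFFSET are known constants at this call site.
(defun double-all (numbers)
  (mapcar #'(lambda (n) (scale n)) numbers))
```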
There can be a huge efficiency advantage to inline expanding functions that have complex keyword-based interfaces, such as this definition of the `member' function:

     (proclaim '(inline member))
     (defun member (item list &key
                         (key #'identity)
                         (test #'eql testp)
                         (test-not nil notp))
       (do ((list list (cdr list)))
           ((null list) nil)
         (let ((car (car list)))
           (if (cond (testp
                      (funcall test item (funcall key car)))
                     (notp
                      (not (funcall test-not item (funcall key car))))
                     (t
                      (funcall test item (funcall key car))))
               (return list)))))

After inline expansion, this call is simplified to the obvious code:

     (member a l :key #'foo-a :test #'char=) =>

     (do ((list list (cdr list)))
         ((null list) nil)
       (let ((car (car list)))
         (if (char= item (foo-a car))
             (return list))))

In this example, there could easily be more than an order of magnitude improvement in speed.  In addition to eliminating the original call to `member', inline expansion also allows the calls to `char=' and `foo-a' to be open-coded.  We go from a loop with three tests and two calls to a loop with one test and no calls.

*Note Source Optimization:: for more discussion of source level optimization.

* Menu:

* Inline Expansion Recording::
* Semi-Inline Expansion::
* The Maybe-Inline Declaration::